Hashing: Resizing Strategies

PolyU DSAI2201 Lecture 13 2025-11-25

The Necessity of Rehashing

To guarantee the desirable O(1)O(1) average-case performance for search and insertion, the Load Factor (λ=N/M\lambda = N/M) must be strictly capped, where NN is the number of elements and MM is the table capacity.

If λ\lambda is allowed to grow indefinitely, collisions increase exponentially, and the average time complexity degrades toward O(N)O(N).

ConditionAction TriggeredImpact
λ<λmax\lambda < \lambda_{max}Standard O(1)O(1) insertionOptimal efficiency maintained.
λλmax\lambda \geq \lambda_{max}Resizing (Rehashing)Restores O(1)O(1) performance, but incurs temporary O(N)O(N) cost.

Common Thresholds (λmax\lambda_{max}): 0.70 to 0.75.

The Resizing Process

Resizing requires recalculating the hash index for every single item currently in the table, a process known as Rehashing.

  1. New Capacity Determination: Select a new capacity MnewM_{new}, usually double the current capacity (Mnew=2MM_{new} = 2M). This ensures the new λ\lambda is half of the critical threshold.
  2. Table Creation: Allocate a new hash table array of size MnewM_{new}.
  3. Item Iteration: Loop through all NN existing elements in the old table.
  4. Re-hashing: For each key kk, calculate the new index using the updated modulus: index=h(k)(modMnew) \text{index}' = h(k) \pmod{M_{new}}
  5. Insertion: Insert the item into the new table at index\text{index}'.

Note: Since the modulus changes, simply copying the array is impossible; every item must be re-inserted.

Amortized Cost

Why Resizing is O(N)O(N)

Resizing requires processing all NN elements, meaning the operation itself takes O(N)O(N) time, which temporarily violates the goal of O(1)O(1) insertion.

Amortized Analysis

We use Amortized Analysis to justify this cost. If the table doubles its size every time it resizes (exponential growth), the expensive O(N)O(N) cost is spread out over a large number of intervening O(1)O(1) insertions.

The average cost of any single insertion, factoring in the periodic O(N)O(N) resize, remains O(1)O(1).

📝 Interactive Quiz

1. A hash map has a capacity M=50M=50 and a max load factor λmax=0.6\lambda_{max} = 0.6. At what number of elements (NN) must a resize be triggered?

  • A) N=25N = 25
  • B) N=30N = 30
  • C) N=31N = 31
  • D) N=50N = 50

2. During a resize, why can't we simply copy elements from the old table to the new, larger table?

  • A) It is computationally slower than rehashing.
  • B) The hash index depends on the table's capacity (MM), which has changed.
  • C) It would cause memory fragmentation.
  • D) The old data is in a read-only state.

3. What is the amortized time complexity of an insertion in a hash table that doubles its size when resizing?

  • A) O(N)O(N)
  • B) O(1)O(1)
  • C) O(logN)O(\log N)
  • D) O(NlogN)O(N \log N)

4. What is the primary consequence of not resizing a hash table when its load factor becomes too high?

  • A) Performance degrades towards O(N)O(N) due to increased collisions.
  • B) The table will run out of memory immediately.
  • C) The hash function itself becomes invalid.
  • D) The table automatically deletes the oldest elements.